Legal contracts, such as employment or lease agreements, are important documents as they govern the obligations and entitlements of the various contracting parties. However, these documents are typically long and written in legalese resulting in lots of manual hours spent in understanding them. In this paper, we address the task of summarizing legal contracts for each of the contracting parties, to enable faster reviewing and improved understanding of them. Specifically, we collect a dataset consisting of pairwise importance comparison annotations by legal experts for ~293K sentence pairs from lease agreements. We propose a novel extractive summarization system to automatically produce a summary consisting of the most important obligations, entitlements, and prohibitions in a contract. It consists of two modules: (1) a content categorize to identify sentences containing each of the categories (i.e., obligation, entitlement, and prohibition) for a party, and (2) an importance ranker to compare the importance among sentences of each category for a party to obtain a ranked list. The final summary is produced by selecting the most important sentences of a category for each of the parties. We demonstrate the effectiveness of our proposed system by comparing it against several text ranking baselines via automatic and human evaluation.
translated by 谷歌翻译
已显示在文本上训练的NLP模型可以重现人类的刻板印象,当系统大规模部署系统时,可以放大边缘化组的危害。我们适应了Koch等人的代理 - 信号 - 局势(ABC)刻板印象模型。(2016年)从社会心理学作为系统研究和发现语言模型(LMS)中刻板印象群体特征关联的框架。我们介绍了用于测量语言模型的刻板印象关联的灵敏度测试(集合)。为了使用ABC模型评估集合和其他措施,我们从美国受试者那里收集小组特征判断,以与英语LM刻板印象进行比较。最后,我们扩展了此框架以测量相互切换身份的LM定型观念。
translated by 谷歌翻译
脚本知识(Schank和Abelson,1975年)长期以来一直被认为是对语言理解至关重要的,因为它可以帮助在叙述中填写未定的信息。然而,由于报告偏见(Gordon和Van Durme,2013年),这些知识昂贵,并且难以从文本中诱导文本昂贵。在这项工作中,我们对科学问题感兴趣,这些问题是通过预先接受训练的生成语言模型(LMS)提供和可访问是否存在显式脚本知识。为此,我们介绍了在自然语言提示形式的情况下生成完整事件序列描述(ESDS)的任务。在零拍摄探测实验中,我们发现生成LMS产生差,主要省略,无关紧要,重复或错误的事件。为了解决这个问题,我们提出了一种基于管道的脚本感应框架(SIF),可以为看不见场景产生良好的质量ESDS(例如,烘烤蛋糕)。 SIF是一个双阶段的框架,在第一阶段的一小集ESD示例上进行微调LM。在第二阶段,使用基于Roberta的模型为取消操作场景生成的ESD来过滤无关的事件,删除重复,并重新排序时间上未按顺序的事件。通过自动和手动评估,我们证明SIF在微调的LM上产生了大量的改进(1美元 - 3美元的蓝点)。然而,手动分析表明,有很好的改进空间,为诱导脚本知识提供了新的研究方向。
translated by 谷歌翻译
事件模式是结构化知识来源,定义了典型的真实情景(例如,去机场)。我们基于新颖的脚本感应系统和允许非专家“程序”复杂事件结构的精心设计的界面,提出了一个框架的架构库的高效循环构造框架。与此工作相关联我们释放了一个模式库:232的机器可读资源,详细的事件模式,其中每个项目都描述了其相关子事件结构(方案中发生的事情)的不同典型方案,参与者(谁扮演在场景中的角色),每个参与者的细粒度打字,以及它们之间的隐含关系约束。我们制作我们的模式库和在线提供的架构界面。
translated by 谷歌翻译
The ''Propose-Test-Release'' (PTR) framework is a classic recipe for designing differentially private (DP) algorithms that are data-adaptive, i.e. those that add less noise when the input dataset is nice. We extend PTR to a more general setting by privately testing data-dependent privacy losses rather than local sensitivity, hence making it applicable beyond the standard noise-adding mechanisms, e.g. to queries with unbounded or undefined sensitivity. We demonstrate the versatility of generalized PTR using private linear regression as a case study. Additionally, we apply our algorithm to solve an open problem from ''Private Aggregation of Teacher Ensembles (PATE)'' -- privately releasing the entire model with a delicate data-dependent analysis.
translated by 谷歌翻译
When testing conditions differ from those represented in training data, so-called out-of-distribution (OOD) inputs can mar the reliability of black-box learned components in the modern robot autonomy stack. Therefore, coping with OOD data is an important challenge on the path towards trustworthy learning-enabled open-world autonomy. In this paper, we aim to demystify the topic of OOD data and its associated challenges in the context of data-driven robotic systems, drawing connections to emerging paradigms in the ML community that study the effect of OOD data on learned models in isolation. We argue that as roboticists, we should reason about the overall system-level competence of a robot as it performs tasks in OOD conditions. We highlight key research questions around this system-level view of OOD problems to guide future research toward safe and reliable learning-enabled autonomy.
translated by 谷歌翻译
One of the major challenges of machine translation (MT) is ambiguity, which can in some cases be resolved by accompanying context such as an image. However, recent work in multimodal MT (MMT) has shown that obtaining improvements from images is challenging, limited not only by the difficulty of building effective cross-modal representations but also by the lack of specific evaluation and training data. We present a new MMT approach based on a strong text-only MT model, which uses neural adapters and a novel guided self-attention mechanism and which is jointly trained on both visual masking and MMT. We also release CoMMuTE, a Contrastive Multilingual Multimodal Translation Evaluation dataset, composed of ambiguous sentences and their possible translations, accompanied by disambiguating images corresponding to each translation. Our approach obtains competitive results over strong text-only models on standard English-to-French benchmarks and outperforms these baselines and state-of-the-art MMT systems with a large margin on our contrastive test set.
translated by 谷歌翻译
The UK COVID-19 Vocal Audio Dataset is designed for the training and evaluation of machine learning models that classify SARS-CoV-2 infection status or associated respiratory symptoms using vocal audio. The UK Health Security Agency recruited voluntary participants through the national Test and Trace programme and the REACT-1 survey in England from March 2021 to March 2022, during dominant transmission of the Alpha and Delta SARS-CoV-2 variants and some Omicron variant sublineages. Audio recordings of volitional coughs, exhalations, and speech were collected in the 'Speak up to help beat coronavirus' digital survey alongside demographic, self-reported symptom and respiratory condition data, and linked to SARS-CoV-2 test results. The UK COVID-19 Vocal Audio Dataset represents the largest collection of SARS-CoV-2 PCR-referenced audio recordings to date. PCR results were linked to 70,794 of 72,999 participants and 24,155 of 25,776 positive cases. Respiratory symptoms were reported by 45.62% of participants. This dataset has additional potential uses for bioacoustics research, with 11.30% participants reporting asthma, and 27.20% with linked influenza PCR test results.
translated by 谷歌翻译
Importance: Social determinants of health (SDOH) are known to be associated with increased risk of suicidal behaviors, but few studies utilized SDOH from unstructured electronic health record (EHR) notes. Objective: To investigate associations between suicide and recent SDOH, identified using structured and unstructured data. Design: Nested case-control study. Setting: EHR data from the US Veterans Health Administration (VHA). Participants: 6,122,785 Veterans who received care in the US VHA between October 1, 2010, and September 30, 2015. Exposures: Occurrence of SDOH over a maximum span of two years compared with no occurrence of SDOH. Main Outcomes and Measures: Cases of suicide deaths were matched with 4 controls on birth year, cohort entry date, sex, and duration of follow-up. We developed an NLP system to extract SDOH from unstructured notes. Structured data, NLP on unstructured data, and combining them yielded seven, eight and nine SDOH respectively. Adjusted odds ratios (aORs) and 95% confidence intervals (CIs) were estimated using conditional logistic regression. Results: In our cohort, 8,821 Veterans committed suicide during 23,725,382 person-years of follow-up (incidence rate 37.18 /100,000 person-years). Our cohort was mostly male (92.23%) and white (76.99%). Across the six common SDOH as covariates, NLP-extracted SDOH, on average, covered 84.38% of all SDOH occurrences. All SDOH, measured by structured data and NLP, were significantly associated with increased risk of suicide. The SDOH with the largest effects was legal problems (aOR=2.67, 95% CI=2.46-2.89), followed by violence (aOR=2.26, 95% CI=2.11-2.43). NLP-extracted and structured SDOH were also associated with suicide. Conclusions and Relevance: NLP-extracted SDOH were always significantly associated with increased risk of suicide among Veterans, suggesting the potential of NLP in public health studies.
translated by 谷歌翻译
In spite of machine learning's rapid growth, its engineering support is scattered in many forms, and tends to favor certain engineering stages, stakeholders, and evaluation preferences. We envision a capability-based framework, which uses fine-grained specifications for ML model behaviors to unite existing efforts towards better ML engineering. We use concrete scenarios (model design, debugging, and maintenance) to articulate capabilities' broad applications across various different dimensions, and their impact on building safer, more generalizable and more trustworthy models that reflect human needs. Through preliminary experiments, we show capabilities' potential for reflecting model generalizability, which can provide guidance for ML engineering process. We discuss challenges and opportunities for capabilities' integration into ML engineering.
translated by 谷歌翻译